Ensemble Gene Selection Versus Single Gene Selection: Which Is Better?
Abstract
One of the major challenges in bioinformatics is selecting the appropriate genes for a given problem and, beyond that, choosing the best gene selection technique for the task. Many such techniques have been developed, each with its own characteristics and complexities. Recently, some works have addressed this challenge by introducing ensemble gene selection: performing multiple runs of gene selection and aggregating the results into a single final list. The question is whether ensemble gene selection improves on the results obtained with single gene selection techniques (e.g., filter-based gene selection techniques used on their own, without any ensemble approach). We compare how five filter-based feature (gene) selection techniques perform with and without a data diversity ensemble approach (applying a single feature selection technique to multiple sampled datasets created from the original one) when used to build models that label cancerous cells (or predict cancer treatment response) based on gene expression levels. Eleven bioinformatics (gene microarray) datasets are employed, along with four feature subset sizes and five learners. Our results show that Fold Change Ratio and Information Gain produce better classification results when an ensemble approach is applied, while Probability Ratio and Signal-to-Noise, in general, perform better without it. For the Area Under the ROC (Receiver Operating Characteristic) Curve ranker, the classification results are similar with or without the ensemble approach. This is, to our knowledge, the first paper to comprehensively examine the difference between the ensemble and single approaches for gene selection in the biomedical and bioinformatics domains.
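As a rough illustration of the data diversity idea described in the abstract, the sketch below contrasts a single run of a filter with an ensemble of runs over resampled data. It is not the paper's implementation. The abstract does not state the sampling scheme, the number of runs, or the aggregation rule, so the stratified bootstrap, the ten runs, and the summed-rank aggregation are all assumptions. The signal-to-noise score uses one common definition, and the function names and toy data are hypothetical.

```python
import numpy as np


def signal_to_noise(X, y):
    """Score each gene as |mean_pos - mean_neg| / (std_pos + std_neg).

    X is (samples, genes); y holds binary labels in {0, 1}.
    """
    pos, neg = X[y == 1], X[y == 0]
    spread = pos.std(axis=0) + neg.std(axis=0) + 1e-12  # avoid division by zero
    return np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / spread


def scores_to_ranks(scores):
    """Convert scores to ranks, where rank 0 is the highest-scoring gene."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(len(scores))
    return ranks


def single_selection(X, y, k):
    """Top-k genes from one run of the filter on the original data."""
    return np.argsort(-signal_to_noise(X, y), kind="stable")[:k]


def ensemble_selection(X, y, k, n_runs=10, seed=0):
    """Top-k genes by summed rank over resampled datasets.

    Assumption: stratified bootstrap resampling and rank-sum aggregation.
    """
    rng = np.random.default_rng(seed)
    rank_sum = np.zeros(X.shape[1])
    for _ in range(n_runs):
        # Resample each class separately so both classes stay represented.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int((y == c).sum()), replace=True)
            for c in (0, 1)
        ])
        rank_sum += scores_to_ranks(signal_to_noise(X[idx], y[idx]))
    return np.argsort(rank_sum, kind="stable")[:k]


def make_toy_data(n_per_class=30, n_genes=200, n_informative=5, seed=1):
    """Synthetic expression matrix; the first n_informative genes are shifted in class 1."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_class, n_genes))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, :n_informative] += 3.0
    return X, y


X, y = make_toy_data()
print("single:  ", sorted(single_selection(X, y, k=5).tolist()))
print("ensemble:", sorted(ensemble_selection(X, y, k=5).tolist()))
```

On easy synthetic data like this, both approaches should pick the same genes. The paper's comparison concerns real microarray data, where the selected lists and the downstream classification results can differ.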
Similar resources
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining, science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. It follows that removing redundant genes can increase the speed and accuracy of classification. After applying the gene se...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....
Construction of T-vector derived from pBluescript ΙΙ SK with a positive selection marker, a rapid system for cloning
A rapid DNA cloning system is a research interest of many scientists. TA cloning is one of the methods used for the cloning of PCR-amplified DNA molecules. The TA cloning method is a convenient and labor-saving replacement for traditional, restriction enzyme-mediated cloning strategies. A T-vector called pBluescript ΙΙ SK-1 with the lethal gene ccdB was designed to construct a positive selection...
Ensembles of Nearest Neighbours for Cancer Classification Using Gene Expression Data
It is known that an ensemble of classifiers can outperform a single best classifier if classifiers in the ensemble are sufficiently diverse (i.e., their errors are as uncorrelated as possible) and accurate. We study ensembles of nearest neighbours for cancer classification based on gene expression data. Such ensembles have rarely been used, because the traditional ensemble methods such as ...
Introduction of Three Independent Selection Markers in Leishmania
The pLE2SCX vector was developed for the stable expression of exogenous genes in the protozoan parasite Leishmania. The pLE2SCX construct contains three independent selection markers: herpes simplex virus thymidine kinase (HSV-TK), cytosine deaminase (CD) and streptothricin acetyltransferase gene (sat) in the multiple cloning site, flanked by 5’ and 3’ untranslated regions of the previously clone...
Publication date: 2013